Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker: Improvements #2720

Merged
merged 7 commits into from
Jan 29, 2021
Merged

Conversation

RunDevelopment
Copy link
Member

This makes a bunch of improvements to the Docker language definition.

The main change is that commands are now parsed as one token. This makes it easier to highlight specific parts of commands (like options) and produces fewer false positives than the previous method. The only downside is that it is not easy to match a command whole so the pattern is a little more complex.
I ran the new Docker grammar against some long and complex Dockerfiles from DockerHub and was not able to find any problems.

Other changes:

  • Some commands have options. Those are now highlighted.
  • Variables are now supported.
  • The AS keyword of FROM is now supported.
  • "Punctuation" was removed. I honestly do not have the slightest idea where this pattern came from. The Dockerfile doc doesn't mention anything in that regard. My best guess is that this pattern was supposed to highlight bash punctuation but this isn't a good idea because the argument may not be a bash command.

@RunDevelopment RunDevelopment added language-definitions needs review dependencies Pull requests that update a dependency file labels Jan 17, 2021
@github-actions
Copy link

github-actions bot commented Jan 17, 2021

JS File Size Changes (gzipped)

A total of 3 files have changed, with a combined diff of +325 B (+19.2%).

file master pull size diff % diff
components/prism-docker.min.js 303 B 701 B +398 B +131.4%
components/prism-elixir.min.js 833 B 761 B -72 B -8.6%
components/prism-http.min.js 559 B 558 B -1 B -0.2%

Generated by 🚫 dangerJS against 5af096e

@edukisto
Copy link
Contributor

@RunDevelopment, just a small remark. I think, AS is not technically a Docker instruction, because

...each instruction is run independently, and causes a new image to be created...

For example,

FROM alpine AS os

takes 1 step and generates 1 image.

sudo docker build .
sudo docker images

@edukisto
Copy link
Contributor

I mean, AS is OK, except the name. It’s confusing.

Oh, there is also a HEALTHCHECK CMD expression.

@RunDevelopment
Copy link
Member Author

You're right, the name isn't very fitting. I've been thinking and the name "instruction" isn't even correct for FROM, CMD, etc. because "instruction" refers to the whole thing, so instruction keyword + argument. FROM and so on are the keywords that start a specific instruction.

It might be most fitting to rename the command to instruction and to rename the current instruction to keyword. What do you think @edukisto?


HEALTHCHECK is a good point. Seems like I also missed ONBUILD.


Also, thanks for looking at this! Reviews really help!

This changes the usage of "instruction" and "keyword" as discussed. It also adds support for some missing keywords.
@edukisto
Copy link
Contributor

I need some more time to answer.

@edukisto
Copy link
Contributor

edukisto commented Jan 22, 2021

The first part. Thoughts on the command alias.

I disagree with this part, because ADD, ARG, CMD, ENTRYPOINT, RUN, etc. are not called commands in the docs.

I try to draw parallels between Dockerfile and assembly languages. ADD, ARG, CMD, etc. are similar to mnemonic identifiers of operations (see Intel 80386, sections 1.3.3 and 2.4). We can find mentions of operations (“ADD operation”, “COPY operation”) in the Dockerfile reference. Mathematical and logical operations are identified by operators, so I had an idea to propose the operator key. However, operator needs operands, but the format of the Dockerfile and corresponding BNFs contain another term, namely “argument”. Functions have arguments. Operation is a special case of function.

That’s why I suggest the function key. The Intel 80386 manual associates instructions with underlying CPU functions, too (see section 4.2).


Docker defines a command as a command-line command. This is evidenced by BNFs. See CMD command param1 param2, ENTRYPOINT command param1 param2 , HEALTHCHECK [OPTIONS] CMD command, RUN <command>.

Some quotes:

Missing command after HEALTHCHECK CMD (from an error log).

The docker build command builds an image... The docker run command line arguments...

...whitespace in instruction arguments, such as the commands following RUN, are preserved...

...the echo command... ...the final pwd command...

Examples.

  1. RUN echo 'Hello, world!'

    • RUN is an operation ID and a keyword;
    • echo 'Hello, world!' is an instruction argument and a command. Thus, a command is a string beginnings with a name of (or a path to) a CLI executable and ending with actual arguments;
    • echo is an executable;
    • 'Hello, world!' is a parameter of the executable. To be quite honest about it, parameter is not an entirely correct term, because parameter is a formal argument, whereas 'Hello, world!' is a value of the actual argument.
  2. RUN ["echo", "Hello, world!"]

    The only difference is ["echo", "Hello, world!"], which is an instruction argument, but not a command.

@edukisto
Copy link
Contributor

[An] instruction refers to the whole thing...

I agree!

@edukisto
Copy link
Contributor

edukisto commented Jan 22, 2021

The second part. Thoughts on FROM...AS and HEALTHCHECK CMD.

FROM --platform=linux/amd64 python:latest AS bleeding_edge_python

  • FROM is an operation ID and a keyword;
  • --platform is a flag;
  • linux/amd64 is the flag value;
  • python:latest, AS, bleeding_edge_python are instruction arguments;
  • moreover, AS is a keyword, although not a reserved word. I agree with this key for AS.

Let me explain why I consider AS an argument of FROM. Both

# An alias is missing!
FROM python:latest AS

and

# AS is missing!
FROM python:latest bleeding_edge_python

give the following error message:

FROM requires either one or three arguments

Therefore, AS is the second argument of the three-argument form. This is the best evidence.

CMD and NONE in HEALTHCHECK are arguments, too.

@edukisto
Copy link
Contributor

It looks like I agree with every existing key except command and operator. AFAIK, a line continuation character is just an escape character, so the char key is more appropriate than the operator key.

Also, thanks for looking at this!

You are welcome!

@RunDevelopment
Copy link
Member Author

RunDevelopment commented Jan 23, 2021

First of all, thank you very much for such a detailed explaination.


The first part. Thoughts on the command alias.

I disagree with this part, because ADD, ARG, CMD, ENTRYPOINT, RUN, etc. are not called commands in the docs.

Agreed. I thought that command was previously used but nope, it wasn't. My bad. I'll remove it.

That’s why I suggest the function key.

I don't think that's a good idea, purely for practical reasons.

First of all, from your writing, it's not clear to me whether you suggest this also for the instruction rule or the keyword rules. Though it matters not because it's not a good idea in either case but for different reasons.

This instruction rule should not have a function alias because then the whole instruction will be highlighted in one color. This docker file can only contain comments and instructions, all text will be given a color different from the default text color. This is simply too much coloring.

The keyword rules should not have a function alias because of how themes implement coloring. Themes do not expect a token to have two supported class names.
Example: The default theme defines a color for keyword tokens and function tokens. If a token has both classes, the color of the function token will be used because the CSS function rule was defined after the keyword rule. Other themes may define these rules in a different order, so highlighting between themes will likely not be consistent.

Maybe we could add something like instruction-name for those keywords?


The second part. Thoughts on FROM...AS and HEALTHCHECK CMD.

You gave a very good explanation. I still think that they should be highlighted as keywords for two reasons:

  1. They are mentioned as literal strings in the BNF grammar of their respective instructions.
  2. It looks better. Syntax highlighting is supposed to help humans read code and I think that highlighting them as keywords helps to achieve that goal. While it may not be an accurate categorization, it is beneficial.

a line continuation character is just an escape character

No, it isn't. Line continuations are removed. This is indirectly mentioned in the "Note on whitespace" in the format section.

Let's look at the following docker file:

FROM debian:buster

RUN echo "\
hello\
world"

It will result in the following output:

C:\fake\path>docker build .
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM debian:buster
 ---> e7d08cddf791
Step 2/2 : RUN echo "helloworld"
 ---> Running in aa37a128ebf0
helloworld
Removing intermediate container aa37a128ebf0
 ---> 969127bc2810
Successfully built 969127bc2810

As you can see, the RUN instruction is parsed as RUN echo "helloworld".

This actually goes even further with the now deprecated empty continuation lines. The following docker file is equivalent to the above one:

FROM debian:buster

RUN echo "\


hello\


world"

While the name operator is debatable, I did choose it for practical reasons. operator is widely supported by our themes and makes it clear that \s have a very special meaning in arguments.

@edukisto
Copy link
Contributor

First of all, from your writing, it's not clear to me whether you suggest this also for the instruction rule or the keyword rules.

That’s true. To be clear:

FROM python:latest AS bleeding_edge_python

generates

<span class="token instruction command">
  <span class="token keyword">FROM</span>
  python:latest
  <span class="token keyword">AS</span>
  bleeding_edge_python
</span>

The instruction and these keyword’s are highlighted absolutely correctly. The only thing I disagree is command. I’ve been thinking about making the whole instruction a function instead of a command. It would imply a function (operation) call.

This instruction rule should not have a function alias because then the whole instruction will be highlighted in one color.

Sad, but true :(

The keyword rules should not have a function alias...

Yes, I’ve been thinking about making FROM an operator instead of a keyword, but it’s not an operator. I’ve been thinking about making FROM a function instead of a keyword, but it’s not a function itself, but a function/operation name in a call. I gave up these ideas yesterday, but didn’t explain it clearly enough.

Maybe we could add something like instruction-name for those keywords?

Marking of FROM, HEALTHCHECK, etc. with token instruction-name keyword classes is a good idea!

I still think that they should be highlighted as keywords...

Yes, AS in FROM, CMD and NONE in HEALTHCHECK are definetely arguments and keywords. I don’t suggest any changes in this part since 22cfc3b. I just confirmed that keyword is the right choice.

a line continuation character is just an escape character

No, it isn't.

The docs say it is :)

The escape directive sets the character used to escape characters in a Dockerfile.

By the way, we can redefine the escape character! A backtick (`) should be added here.

A single backslash (or a single tick) escapes a single newline and doesn’t affect anything else. The behaviour is similar to the line continuation escape character in the POSIX shell language:

The <backslash> and <newline> shall be removed...

For example,

# escape=`
FROM alpine
COPY `t`
`e`
`s`
`t`
`.`
`t`
`x`
`t .

makes Docker search for test.txt. So I can’t see any difference between escaping newlines and escaping other characters.

This actually goes even further with the now deprecated empty continuation lines...

Well, this is a good point. It makes backslash/backtick work as an “empty line removing” operator.

@RunDevelopment
Copy link
Member Author

a line continuation character is just an escape character

No, it isn't.

The docs say it is :)

Ahh, my bad. I read "escape character" as "character escape".

Yes, while the line continuation character is also the normal escape character, a line continuation is not a character escape. So I think it deserves special highlighting.

By the way, we can redefine the escape character!

I know and I hate it.

A backtick (`) should be added here.

Not just there. The entire instruction parsing has to be adjusted. It sounds dramatic but it's really just replacing all \ characters with a [\\`] character class, so it's quite simple.

But I don't know whether we should actually do that. Doing the simple replacement means that we will get false positives for either escape mode. Rn, we don't support backticks but backslashes are handled correctly.

I have never seen backticks being used in the wild and deemed them as unimportant. Their expressed purpose is to help writing docker files for Windows VMs but I haven't found any Windows docker files on Dockerhub that actually use them. Do we have to support them?

@edukisto
Copy link
Contributor

...false positives for either escape mode.

Ah, I see... Now we have two syntaxes: dockerfile-backticks and dockerfile-backslashes.

Do we have to support them?

Well, Microsoft mentions the escape directive, but yes, it turns out that they consistently use forward slashes as a directory separator in their Dockerfiles.

The work is not worth it.

Thank you very much for an interesting discussion!

@RunDevelopment
Copy link
Member Author

Now we have two syntaxes

Exactly. Twice the work because of a feature nobody uses.


Thank you for the review and in-depth analysis!

@RunDevelopment RunDevelopment added enhancement and removed dependencies Pull requests that update a dependency file labels Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants